Motivation
Four general principles
Case study
Costs and benefits
25 May 2016
Motivation
Universalism
'Communism'
Disinterestedness
Organized skepticism
Robert Boyle's vacuum pump
Documentation
'Communal witnessing'
Circumstances
Empirical Reproducibility
Computational Reproducibility
Statistical Reproducibility
Computational Biology
Computational Physics
Computational Chemistry
Computational Economics
Computational …
Reproducibility is necessary for scientific progress
Computers wrangle all the data, but also obscure it
Especially point-and-click actions
Technical solutions available in open source/format/data/access
Four general principles of reproducible research that have emerged across the sciences
✓ Plain text file formats
✓ Persistent URLs
Victoria Stodden's Reproducible Research Standard
✓ Data: CC-0 (public domain)
✓ Code: MIT (no liability for reuse)
✓ Text/Figures/Media: CC-BY (attribution required)
✗ Mouse gestures leave few traces that are enduring and accessible to others
✗ Easy to lose track of ad hoc changes in mouse-driven environments
✓ Scripts for data ingest, cleaning, analysis, visualizing, and reporting
✓ Scripts create a very high-resolution record of the research workflow in a plain text file that can be reused and inspected by others
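A minimal sketch of what a scripted workflow can look like, assuming a tiny made-up dataset. The column names (`site`, `length_mm`), the values, and the function names are all hypothetical and chosen only for illustration. Each stage named above (ingest, clean, analyse, report) is a plain function, so the whole pipeline is recorded in one text file that others can rerun and inspect.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data; in a real project this would be read from a CSV file.
RAW_CSV = """site,length_mm
A,12.5
A,
B,9.0
B,11.0
"""

def ingest(text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Drop rows with a missing measurement and convert the rest to floats."""
    return [
        {"site": r["site"], "length_mm": float(r["length_mm"])}
        for r in rows
        if r["length_mm"].strip()
    ]

def analyse(rows):
    """Compute the mean length for each site."""
    by_site = {}
    for r in rows:
        by_site.setdefault(r["site"], []).append(r["length_mm"])
    return {site: mean(values) for site, values in by_site.items()}

def report(summary):
    """Format the summary as plain text, one line per site."""
    return "\n".join(
        f"{site}: {value:.2f} mm" for site, value in sorted(summary.items())
    )

if __name__ == "__main__":
    # Every step from raw data to report is explicit and repeatable.
    print(report(analyse(clean(ingest(RAW_CSV)))))
```

Because the cleaning rule (drop rows with no measurement) is written down rather than applied by hand, a reader can see exactly which observations were excluded and why.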
✗ Managing different versions of computer files is very challenging
✗ Poor version control leads to losing track of the provenance of results
✓ Version control systems (VCS) designed for software engineering are suitable for research code and text
✓ Commit history preserves a high-resolution, transparent record of the development of a file or set of files
✓ Enables remote collaborators to work together without overwriting each other’s work
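One way to tie a result back to the code that produced it is to store the current commit hash alongside the output. The sketch below assumes the project lives in a Git repository and that the `git` command is on the path. The function names `current_commit` and `stamp_result` are hypothetical helpers, not part of any library.

```python
import subprocess

def current_commit(repo_dir="."):
    """Return the hash of the checked-out Git commit, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not a repository with commits.
        return None
    return result.stdout.strip()

def stamp_result(value, repo_dir="."):
    """Pair a computed result with the commit of the code that produced it."""
    return {"value": value, "commit": current_commit(repo_dir)}

if __name__ == "__main__":
    print(stamp_result(42))
```

The stamp is only meaningful if the working directory has no uncommitted changes, so in practice it helps to commit before running an analysis.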
✗ Minor changes in software can cripple complex research pipelines
✗ Managing software dependencies is tedious
✓ List of the key pieces of software and their version numbers
✓ Archive a self-contained computational environment like a virtual machine or Linux container
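As a sketch of the first point, a script can write out the versions of the software it depends on each time it runs. The example below uses Python's standard `importlib.metadata` module. The helper names `environment_record` and `format_record`, and the package names passed in, are hypothetical. This is a lightweight complement to archiving a full virtual machine or container, not a replacement for it.

```python
import platform
from importlib import metadata

def environment_record(packages):
    """Return the interpreter version and the installed version of each named package."""
    record = {"python": platform.python_version()}
    for name in packages:
        try:
            record[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing.
            record[name] = "not installed"
    return record

def format_record(record):
    """Format the record as plain text, one 'name version' line per entry."""
    return "\n".join(f"{name} {version}" for name, version in sorted(record.items()))

if __name__ == "__main__":
    # Hypothetical list of the key packages an analysis relies on.
    print(format_record(environment_record(["numpy", "pandas"])))
```

Saving this output next to the results gives later readers a starting point for rebuilding the environment.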
Case Study
All files on figshare.com
Data in CSV format
Organised as an R package
R & Rmarkdown documents
All files tracked with Git, hosted on GitHub
Collaboration did not occur on GitHub because no co-authors used it
Docker image and Dockerfile to contain RStudio, packages, code and external dependencies
Based on Rocker image and templates
"research compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data,…), and as a means for distributing, managing and updating the collection."
README.md
R package
Version control
Environment
Costs & benefits
Time learning the tools
That's all
Built-in vs Bolt-on
Comfort of knowing that I am right & have no secrets
Save time by reusing my previous code
Open data confers citation advantages, but magnitude is highly variable
Open Source community membership provides access to high-quality help
Open methods and materials, scripted workflow, version control and environment control are generic principles suitable for most fields of research
The specific tools will change over time, but the principles will endure
For most people, the technical problems already have good solutions; the remaining challenge is cultural (e.g. syllabi & peer reviews)
Presentation written in R Markdown using ioslides
Compiled into HTML5 using RStudio & knitr
Source code hosting: https://github.com/benmarwick/UOW-NIASRA-2016-talk
ORCID: http://orcid.org/0000-0001-7879-4531
Licensing: